The Author-Topic Model for Authors and Documents

Authors

  • Michal Rosen-Zvi
  • Thomas L. Griffiths
  • Mark Steyvers
  • Padhraic Smyth

Abstract

We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with its authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets, so we use Gibbs sampling to estimate the topic and author distributions. We compare its performance with two other generative models for documents, both of which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model and demonstrate applications to computing similarity between authors and the entropy of author output.
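
The generative process described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy sizes, and the Dirichlet concentration values are hypothetical, and the choice of drawing an author uniformly at random for each word is an assumption that goes beyond what the abstract states. The sketch only generates data; it does not perform the Gibbs sampling inference mentioned above.

```python
import random

def sample_dirichlet(dim, concentration, rng):
    """Draw from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(doc_authors, theta, phi, n_words, rng):
    """Sample (author, topic, word) triples for one document.

    theta[a] is author a's multinomial distribution over topics;
    phi[t] is topic t's multinomial distribution over word ids.
    """
    tokens = []
    for _ in range(n_words):
        # Assumption: each word is attributed to one of the document's
        # authors, chosen uniformly at random.
        a = rng.choice(doc_authors)
        # Draw a topic from that author's topic distribution.
        t = rng.choices(range(len(theta[a])), weights=theta[a])[0]
        # Draw a word from that topic's word distribution.
        w = rng.choices(range(len(phi[t])), weights=phi[t])[0]
        tokens.append((a, t, w))
    return tokens

rng = random.Random(0)
num_authors, num_topics, vocab_size = 3, 4, 10  # toy sizes (hypothetical)
theta = [sample_dirichlet(num_topics, 0.5, rng) for _ in range(num_authors)]
phi = [sample_dirichlet(vocab_size, 0.5, rng) for _ in range(num_topics)]

# A two-author document written by authors 0 and 2.
doc = generate_document([0, 2], theta, phi, n_words=20, rng=rng)
```

Because each word's topic comes from one of the document's authors, the document's overall topic distribution is a mixture of those authors' distributions, which is the property the abstract describes.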


Similar resources

Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval

Background and Aim: This research investigates the impact of authors’ rank in bibliographic networks on the document-centered model of expertise retrieval. Its purpose is to find out what kind of author ranking in bibliographic networks can improve the performance of the document-centered model. Methodology: The current research is experimental. To operationalize the research goals, a new test colle...


Exploiting Temporal Authors Interests via Temporal-Author-Topic Modeling

This paper addresses the problem of discovering authors' temporal interests. Traditional approaches used stylistic features or graph connectivity and ignored the semantics-based intrinsic structure of words across documents, while previous topic modeling approaches considered semantics without the time factor, which is against the spirit of writing. We present Temporal-Author-Topic (TAT...


Drawing Co-Citation Networks of Corona Virus Studies

Background and Aim: The purpose of the present study is to map the coronavirus domain citation network to better understand this domain based on all other citation networks. Materials and Methods: The present study is applied in terms of purpose and is a descriptive scientometric study in terms of type, carried out with the all-citation method. In this study, all scientific publications on ...


The Author-Topic Model and the author prediction

The author-topic model, proposed by Michal Rosen-Zvi et al., is a generative model for documents that extends Latent Dirichlet Allocation to include authorship information. The model associates each author with a multinomial distribution over topics and each topic with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics tha...


Experts’ Retrieval with Multiword-Enhanced Author Topic Model

In this paper, we propose a multiword-enhanced author topic model that clusters authors with similar interests and expertise, and apply it to an information retrieval system that returns a ranked list of authors related to a keyword. For example, we can retrieve Eugene Charniak by searching for statistical parsing. Existing work on author topic modeling assumes a “bag-of-words” representation....



Publication date: 2004